
Phase 6.2: async outbound connect — eliminate 3s vCPU stall#78

Merged
dpsoft merged 10 commits into main from phase6.2-async-connect-rebased
May 6, 2026

Conversation


@dpsoft dpsoft commented May 6, 2026

What this branch does

Replaces the synchronous TcpStream::connect_timeout(addr, 3s) on the vCPU thread with a non-blocking connect + EPOLLOUT-driven completion on the net-poll thread. The vCPU thread is never blocked on connect again.

Severity: Medium-High — today, a guest opening a connection to ONE unreachable destination freezes ALL guest networking for up to 3 seconds (the connect_timeout). DNS misconfigurations, transient NAT failures, or one slow destination among many freeze the whole pipeline.

Headline win

| Workload | Before | After |
|---|---|---|
| vCPU thread blocked on connect_timeout | up to 3 s | < 100 µs |
| Other flows during a stuck connect | also blocked | unaffected |

The new BROKEN_ON_PURPOSE pin tcp_connect_to_unreachable_does_not_block_other_flows flips at the EPOLLOUT-completion commit (91947a3, feat(slirp): EPOLLOUT-driven async connect completion).

Architecture

  • New TcpNatState::Connecting state.
  • Guest SYN → socket2::Socket::new(IPV4, STREAM, TCP) set non-blocking → connect() returns EINPROGRESS → insert flow with state = Connecting, register FD with RegisterMode::Write → return immediately to vCPU.
  • Net-poll thread sees EPOLLOUT readiness → relay_pending_connects checks getsockopt(SO_ERROR):
    • zero → transition to SynReceived, send SYN-ACK to guest, modify epoll interest Write→Read.
    • non-zero → send RST to guest, reap the flow.
  • CONNECT_TIMEOUT (3 s) reaping for stuck Connecting flows (silent firewall drop) — uses Phase 6.1's last_state_change field.
  • New EpollDispatch::modify (EPOLL_CTL_MOD) flips Write→Read on connect completion.
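The EPOLLOUT-completion decision above can be sketched in std-only Rust. The names (`TcpNatState`, `CompletionAction`, `on_epollout`) mirror the PR's vocabulary, but the types and signature are illustrative assumptions, not the actual SlirpBackend API:

```rust
#[derive(Debug, PartialEq)]
enum TcpNatState {
    Connecting,
    SynReceived,
    Closed,
}

#[derive(Debug, PartialEq)]
enum CompletionAction {
    /// connect succeeded: send SYN-ACK to the guest, flip epoll interest Write→Read
    SendSynAck,
    /// connect failed with this errno: send RST to the guest, reap the flow
    SendRst(i32),
}

/// Decide what to do for a Connecting flow once EPOLLOUT fires, based on the
/// value read back via getsockopt(SO_ERROR) (hypothetical helper, not the PR's code).
fn on_epollout(state: &mut TcpNatState, so_error: i32) -> CompletionAction {
    debug_assert_eq!(*state, TcpNatState::Connecting);
    if so_error == 0 {
        *state = TcpNatState::SynReceived;
        CompletionAction::SendSynAck
    } else {
        *state = TcpNatState::Closed;
        CompletionAction::SendRst(so_error)
    }
}

fn main() {
    const ECONNREFUSED: i32 = 111; // Linux errno

    let mut ok = TcpNatState::Connecting;
    assert_eq!(on_epollout(&mut ok, 0), CompletionAction::SendSynAck);
    assert_eq!(ok, TcpNatState::SynReceived);

    let mut failed = TcpNatState::Connecting;
    assert_eq!(
        on_epollout(&mut failed, ECONNREFUSED),
        CompletionAction::SendRst(ECONNREFUSED)
    );
    assert_eq!(failed, TcpNatState::Closed);
}
```

Reading SO_ERROR on EPOLLOUT (rather than calling connect() a second time) is the standard way to learn a non-blocking connect's outcome.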

Bench evidence

scripts/bench-compare.sh --baseline 47868f0 --skip-vm:

| Bench | Baseline | HEAD | Note |
|---|---|---|---|
| process_syn_during_pending_connects/0 | — | 12.8 µs | new bench |
| process_syn_during_pending_connects/10 | — | 12.6 µs | flat |
| process_syn_during_pending_connects/100 | — | 656 ns | O(1) — cost doesn't scale |
| process_syn_during_pending_connects/1000 | — | 1.39 µs | with backlog size |
| port_forward_accept_latency | 50.1 ms | 183 µs | inherited from #77 |
| poll_with_n_mixed_flows/999 | 304 µs | 10.1 µs | −96.7 % held |
| tcp_bulk_throughput_1mb | 58.8 ms | 58 ms | parity |

Wall-clock vs master

| Metric | Master | This branch | Δ |
|---|---|---|---|
| TCP g2h throughput | 1885 Mbps | 5630 Mbps | +199 % (3.0×) |
| TCP bulk-g2h | 1565 Mbps | 4940 Mbps | +216 % (3.2×) |
| TCP CRR p50 | ~10 ms | ~10.1 ms | parity |
| TCP RR p50 | 2 µs | 2 µs | parity |

(Master baseline measured pre-6.x; current main already includes Phase 6.4 epoll dispatch (#69), Phase 6.1 half-close (#76), and port-forward listener on epoll (#77), so this PR's incremental delta vs the new main is the async-connect win specifically — the headline wall-clock numbers are the cumulative phase-6 stack.)

Commits (10)

Cherry-picked clean from smoltcp-passt-port-phase6.2-async-connect onto current main:

  1. docs: Phase 6.2 detailed TDD plan — async outbound connect
  2. chore: add socket2 dep for non-blocking connect (Cargo.lock regenerated)
  3. feat(slirp): add TcpNatState::Connecting + guest_isn field
  4. test(network): pin tcp_connect_to_unreachable_does_not_block_other_flows (BROKEN_ON_PURPOSE)
  5. feat(slirp): non-blocking connect — Connecting state for in-flight handshakes
  6. feat(slirp): EPOLLOUT-driven async connect completion (relay_pending_connects) — flips the BROKEN_ON_PURPOSE pin
  7. test(network): pin tcp_connect_async_eventual_rst_on_failure
  8. feat(slirp): CONNECT_TIMEOUT reaping for stuck Connecting flows
  9. bench(network): process_syn_during_pending_connects (Phase 6.2 baseline)
  10. fix(bench): drop unused Ipv4Address import; qualify the one use site

(The Phase 6.2 empty-marker validation-gate commit was skipped during cherry-pick.)

Test plan

  • cargo fmt --all -- --check — clean
  • cargo clippy --workspace --all-targets --all-features -- -D warnings — clean
  • RUSTDOCFLAGS="-D warnings" cargo doc --no-deps --all-features — clean
  • cargo test --test network_baseline -- --test-threads=1 — 22/22 (was 20; +2 connect pins)
  • cargo test --test network_baseline --features bench-helpers -- --test-threads=1 — 24/24 stable across 4/5 runs (1/5 hits the pre-existing tcp_port_forward_inbound_connect_succeeds parallel-bind flake unrelated to this change)
  • CI

Stacked follow-up

Phase 6.3 (TCP window management) will rebase onto post-6.2 main next.

Replaces draft #74 (same async-connect content via the now-superseded #73 chain).

dpsoft added 10 commits May 5, 2026 22:10
9 bite-sized tasks covering the TcpStream::connect_timeout(3s)
removal from the vCPU TX path:

- New TcpNatState::Connecting state.
- Non-blocking socket via socket2 + EINPROGRESS handling.
- EPOLLOUT-driven completion in relay_pending_connects, called
  from drain_to_guest before relay_tcp_nat_data.
- getsockopt(SO_ERROR) checks the actual connect outcome on
  EPOLLOUT readiness.
- EpollDispatch::modify (EPOLL_CTL_MOD) flips Write→Read on
  successful connect.
- CONNECT_TIMEOUT (3s) reaping for stuck Connecting flows
  (silent firewall drop).
- Two new pins: connect-to-unreachable-doesn't-block-others
  (BROKEN_ON_PURPOSE → flips at Task 5) + async-RST-on-failure.
- One new bench: process_syn_during_pending_connects parametric
  on N pending connecting flows (O(1) regression gate).

Severity: MEDIUM-HIGH. Today TcpStream::connect_timeout(addr, 3s)
on the vCPU thread freezes ALL guest networking for up to 3s
when one destination is slow/unreachable.
feat(slirp): non-blocking connect — Connecting state for in-flight handshakes

Replace the synchronous TcpStream::connect_timeout(3s) on the vCPU thread
with a non-blocking socket2 connect that returns EINPROGRESS immediately.
Flows are inserted with TcpNatState::Connecting and their fd registered for
EPOLLOUT. EPOLLOUT-driven completion (Task 5: relay_pending_connects) will
promote them to SynReceived and send SYN-ACK.  An unreachable destination
no longer blocks all other guest networking for 3 seconds.
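The EINPROGRESS branch described above can be modeled in std-only Rust. `ConnectStart` and `classify_connect` are hypothetical names for illustration; 115 is the Linux errno value of EINPROGRESS, which socket2's connect surfaces through io::Error::raw_os_error:

```rust
use std::io;

/// Linux errno for a non-blocking connect that is still in flight (assumption:
/// Linux target; other platforms use different errno values).
const EINPROGRESS: i32 = 115;

#[derive(Debug, PartialEq)]
enum ConnectStart {
    /// Rare (e.g. loopback): the connect completed synchronously.
    Established,
    /// Normal non-blocking path: insert the flow as Connecting and
    /// register the fd for EPOLLOUT.
    InFlight,
    /// Immediate failure (e.g. ENETUNREACH): answer the guest SYN with RST.
    Failed(i32),
}

/// Classify the result of a non-blocking connect() call (illustrative helper).
fn classify_connect(result: io::Result<()>) -> ConnectStart {
    match result {
        Ok(()) => ConnectStart::Established,
        Err(e) if e.raw_os_error() == Some(EINPROGRESS) => ConnectStart::InFlight,
        Err(e) => ConnectStart::Failed(e.raw_os_error().unwrap_or(-1)),
    }
}

fn main() {
    let in_flight = classify_connect(Err(io::Error::from_raw_os_error(EINPROGRESS)));
    assert_eq!(in_flight, ConnectStart::InFlight);
    assert_eq!(classify_connect(Ok(())), ConnectStart::Established);
}
```

Only the InFlight arm blocks nothing: the vCPU thread returns to the guest immediately and the net-poll thread picks the flow up on EPOLLOUT.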
feat(slirp): EPOLLOUT-driven async connect completion (relay_pending_connects)

Add EpollDispatch::modify (EPOLL_CTL_MOD) to atomically switch a registered
fd's event interest from Write to Read without a DEL+ADD window. Add
relay_pending_connects, called from drain_to_guest before relay_tcp_nat_data,
which drives all pending Connecting flows: checks SO_ERROR, sends SYN-ACK and
transitions to SynReceived on success, or RST and Closed on failure. Update
rebuild_epoll_from_flow_table to reap Connecting entries post-snapshot (the
underlying socket fd is dead after restore). The BROKEN_ON_PURPOSE pin
tcp_connect_to_unreachable_does_not_block_other_flows now passes.
test(network): pin tcp_connect_async_eventual_rst_on_failure

Verifies that connecting to a recently-dropped listener port eventually
delivers a RST to the guest via relay_pending_connects's SO_ERROR path.
Already passes after Task 5 lands; pinned now to guard the behavior.
feat(slirp): CONNECT_TIMEOUT reaping for stuck Connecting flows

Add Connecting-timeout detection to relay_tcp_nat_data's timeout sweep.
Flows stuck in Connecting for longer than CONNECT_TIMEOUT (3 s — matching
the pre-Phase-6.2 synchronous connect_timeout behavior) are reaped: a RST
is sent to the guest and the flow table entry is removed. This handles the
silent-firewall-drop case where EPOLLOUT never fires.
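A minimal sketch of that sweep, assuming a simplified flow table keyed by an id; the `Flow` struct, `reap_stuck_connects` name, and signature are illustrative, though `last_state_change` and the 3 s budget come from the PR:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Matches the pre-Phase-6.2 synchronous connect_timeout budget.
const CONNECT_TIMEOUT: Duration = Duration::from_secs(3);

struct Flow {
    /// Phase 6.1 field reused by the sweep.
    last_state_change: Instant,
    /// Stand-in for `state == TcpNatState::Connecting`.
    connecting: bool,
}

/// Return the ids of Connecting flows stuck longer than `timeout` and drop
/// them from the table; the caller would send an RST to the guest for each.
fn reap_stuck_connects(
    flows: &mut HashMap<u32, Flow>,
    now: Instant,
    timeout: Duration,
) -> Vec<u32> {
    let stuck: Vec<u32> = flows
        .iter()
        .filter(|(_, f)| f.connecting && now.duration_since(f.last_state_change) > timeout)
        .map(|(id, _)| *id)
        .collect();
    for id in &stuck {
        flows.remove(id);
    }
    stuck
}

fn main() {
    let now = Instant::now();
    let mut flows = HashMap::new();
    flows.insert(1, Flow { last_state_change: now, connecting: true });
    // A fresh Connecting flow survives the sweep.
    assert!(reap_stuck_connects(&mut flows, now, CONNECT_TIMEOUT).is_empty());
    // Four seconds later (simulated), it is reaped.
    let later = now + Duration::from_secs(4);
    assert_eq!(reap_stuck_connects(&mut flows, later, CONNECT_TIMEOUT), vec![1]);
    assert!(flows.is_empty());
}
```

Passing `now` explicitly keeps the sweep deterministic under test; the real sweep would read the clock once per relay_tcp_nat_data pass.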
bench(network): process_syn_during_pending_connects (Phase 6.2 baseline)

Add insert_synthetic_connecting_entry bench-helper to SlirpBackend and
add the process_syn_during_pending_connects parametric bench (args: 0, 10,
100, 1000 pending connects). Validates that the SYN-handler cost is O(1)
in pending-connect backlog size — only flow_table.insert + epoll.register,
both O(1).
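The shape of that O(1) argument, in a std-only sketch (the helper and handler names below are illustrative analogues, not the SlirpBackend API, and the epoll registration step is omitted):

```rust
use std::collections::HashMap;

/// Stand-in for the flow-table entry; only the Connecting case matters here.
struct Flow {
    connecting: bool,
}

/// Bench-helper analogue: pre-populate N synthetic Connecting entries.
fn insert_synthetic_connecting_entries(flows: &mut HashMap<u32, Flow>, n: u32) {
    for id in 0..n {
        flows.insert(id, Flow { connecting: true });
    }
}

/// The SYN handler's flow-table work: one hash-map insert, whose cost does
/// not depend on how many Connecting entries already exist.
fn process_syn(flows: &mut HashMap<u32, Flow>, flow_id: u32) {
    flows.insert(flow_id, Flow { connecting: true });
}

fn main() {
    // Same parametrization as the bench: 0, 10, 100, 1000 pending connects.
    for backlog in [0u32, 10, 100, 1000] {
        let mut flows = HashMap::new();
        insert_synthetic_connecting_entries(&mut flows, backlog);
        process_syn(&mut flows, u32::MAX); // the SYN under test
        assert_eq!(flows.len(), backlog as usize + 1);
    }
}
```

The bench then times only the `process_syn` step at each backlog size; a flat curve is the regression gate.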
fix(bench): drop unused Ipv4Address import; qualify the one use site

The import was only consumed at the bench-helpers-gated
process_syn_during_pending_connects bench (Task 8). Default-feature
clippy --bench network failed with -D warnings because the import
is unused when bench-helpers is off.

Quickest fix: qualify the single bare-name use as
smoltcp::wire::Ipv4Address (matches the other call sites in the
file) and drop Ipv4Address from the use list.
@dpsoft dpsoft merged commit 2e95099 into main May 6, 2026
37 of 39 checks passed
@dpsoft dpsoft deleted the phase6.2-async-connect-rebased branch May 6, 2026 01:35
